##       age            sex           bmi           children     smoker
##  Min.   :18.00   female:662   Min.   :15.96   Min.   :0.000   no :1064
##  1st Qu.:27.00   male  :676   1st Qu.:26.30   1st Qu.:0.000   yes: 274
##  Median :39.00                Median :30.40   Median :1.000
##  Mean   :39.21                Mean   :30.66   Mean   :1.095
##  3rd Qu.:51.00                3rd Qu.:34.69   3rd Qu.:2.000
##  Max.   :64.00                Max.   :53.13   Max.   :5.000
##        region       charges
##  northeast:324   Min.   : 1122
##  northwest:325   1st Qu.: 4740
##  southeast:364   Median : 9382
##  southwest:325   Mean   :13270
##                  3rd Qu.:16640
##                  Max.   :63770
The strongest correlations are between charges and age, and between bmi and charges.
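A correlation matrix of this kind can be computed with base R's `cor()`. The sketch below is only an illustration: the `insurance` data frame is synthetic stand-in data with the same numeric column names as the summary above, and the coefficients used to generate `charges` are arbitrary assumptions, not values from the real dataset.

```r
# Synthetic stand-in for the insurance data (hypothetical values,
# generated only so that the example runs on its own).
set.seed(42)
n <- 200
insurance <- data.frame(
  age      = sample(18:64, n, replace = TRUE),
  bmi      = rnorm(n, mean = 30, sd = 6),
  children = sample(0:5, n, replace = TRUE)
)
# Arbitrary linear dependence on age and bmi plus noise (an assumption).
insurance$charges <- 250 * insurance$age + 300 * insurance$bmi +
  rnorm(n, sd = 5000)

# Pearson correlation matrix of the numeric columns.
num_cols <- c("age", "bmi", "children", "charges")
cor_mat <- cor(insurance[, num_cols], method = "pearson")
print(round(cor_mat, 2))
```

On the real data, the same `cor()` call on the numeric columns would give the coefficients behind the observation above.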
##   age children    bmi   charges sex_bin smoker_bin northeast northwest
## 1  19        0 27.900 16884.924       0          1         0         0
## 2  18        1 33.770  1725.552       1          0         0         0
## 3  28        3 33.000  4449.462       1          0         0         0
## 4  33        0 22.705 21984.471       1          0         0         1
## 5  32        0 28.880  3866.855       1          0         0         1
## 6  31        0 25.740  3756.622       0          0         0         0
##   southwest southeast
## 1         1         0
## 2         0         1
## 3         0         1
## 4         0         0
## 5         0         0
## 6         0         1
When visualizing the hierarchical clustering, we will consider the most strongly correlated variables: bmi, age, and charges.
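A minimal sketch of how such a clustering could be built in base R follows. The data here are synthetic and hypothetical, and the standardization step, the Euclidean distance, the Ward linkage (`"ward.D2"`), and the cut into three clusters are all assumptions made for illustration; the report itself does not state which settings were used.

```r
# Synthetic stand-in data with the three variables of interest
# (hypothetical values, not the real insurance records).
set.seed(1)
n <- 60
df <- data.frame(
  age = sample(18:64, n, replace = TRUE),
  bmi = rnorm(n, mean = 30, sd = 6)
)
df$charges <- 250 * df$age + 300 * df$bmi + rnorm(n, sd = 5000)

# Standardize so that charges (large scale) does not dominate the distances.
scaled <- scale(df[, c("bmi", "age", "charges")])

# Pairwise Euclidean distances and agglomerative clustering (Ward linkage).
d  <- dist(scaled, method = "euclidean")
hc <- hclust(d, method = "ward.D2")

# Dendrogram; labels are suppressed because row numbers are not informative.
plot(hc, labels = FALSE, main = "Hierarchical clustering: bmi, age, charges")

# Cut the tree into k = 3 groups (k is an arbitrary choice here).
clusters <- cutree(hc, k = 3)
print(table(clusters))
```

Without the `scale()` call, the distances would be driven almost entirely by `charges`, whose values are orders of magnitude larger than `age` or `bmi`.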